ABSTRACT


In order to verify that a test made of items selected 
from a pool produces the same Rasch scaled achievement scores as 
would be predicted from previous Rasch difficulty calibrations for 
those items, the following experiment was conducted. Seven math tests 
with similar content but graduated difficulty were administered to 
students in grade seven. The tests overlapped each other so that each 
test included about 20 items that were also included in one of the 
other six tests. The items in these seven tests were calibrated for 
difficulty using the Rasch procedure. Through the linking data 
provided by the common items, each of the items in all of the tests 
was adjusted to the same scale of difficulty. These items then formed 
a single pool of items. These items were listed in ascending 
difficulty order and were divided into seven level tests each with 30 
items. These new tests were administered to about 1500 students in 
the seventh grade. Rasch item calibrations were again computed, and 
linking data used to adjust items to a common achievement scale. 
Rasch scaled achievement scores were computed for each possible raw 
score for each of the level tests and for the same items as 
calibrated in the original pool. The comparable scaled achievement 
scores for the tests were established on the basis of the actual 
performance of students on the second administration. Tables present 
the results. (RC) 
One of the more promising uses of the Rasch model is the support it 
can provide in constructing a statistically sound item pool. If the items 
in the pool are screened for consistency in level of difficulty and 
discrimination, the information provided by the Rasch procedure makes it 
possible to choose any subset of items from that pool to form a test and 
develop known statistical characteristics based on previous experience 
with the items. Besides providing a range of scores that can be used to 
determine the rank-order performance of students, the Rasch procedure 
provides a measure for each student on a performance continuum that 
spans the full range of the item bank. For each raw score it yields an 
interval-level estimate of ability together with a standard error of 
that estimate. Because all scores lie on a single underlying scale, it is 
relatively straightforward to relate the performance of one group to 
that of other groups (normative comparisons). 
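
For reference, the properties described above follow from the standard dichotomous Rasch model, which can be written as below. The notation is introduced here for illustration and does not appear in the source: \theta_v is the ability of student v, b_i the difficulty of item i, L the number of items on the test, and r a raw score. The ability estimate for raw score r is the value at which the expected score equals r, and its standard error is the usual large-sample approximation.

```latex
P(x_{vi} = 1 \mid \theta_v, b_i) = \frac{\exp(\theta_v - b_i)}{1 + \exp(\theta_v - b_i)}

\sum_{i=1}^{L} P_i(\hat{\theta}_r) = r, \qquad 0 < r < L

\mathrm{SE}(\hat{\theta}_r) \approx \left[ \sum_{i=1}^{L} P_i(\hat{\theta}_r)\,\bigl(1 - P_i(\hat{\theta}_r)\bigr) \right]^{-1/2}
```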


We have verified experimentally that the Rasch procedure produces an 
equal-interval scale of very high quality and objectivity. We have also 
verified that as long as items represent a consistent content area and are 
Rasch calibrated, we can achieve reliable estimates of achievement level for 
all sets of items drawn from the pool. 

In order to verify that a test made of items selected from a pool 
produces the same Rasch scaled achievement scores as would be predicted 
from previous Rasch difficulty calibrations for those items, we conducted 
the following experiment. 

In the spring of 1974, seven math tests with similar content but 
graduated difficulty were administered to approximately 1500 students 
in grade 7. The tests overlapped each other so that each test included 
about 20 items that were also included in one of the other six tests. 
These seven tests were designated as forms W, X, Y, Z, D1, D2, and D3. 
The items in these seven tests were calibrated for difficulty using the 
Rasch procedure. Through the linking data provided by the common items, 
each of the items in all of the tests was adjusted to the same scale of 
difficulty. These items then formed a single pool of items. 
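
The linking step can be sketched as follows. The source does not state the exact linking formula, so this sketch assumes a simple and commonly used one: the shift between two forms is the mean difference in calibrated difficulty over their common items. The form names, item labels, and difficulty values are hypothetical.

```python
from statistics import mean


def link_constant(base, new):
    """Mean difference in difficulty (base minus new) over the common items.

    Assumption: a mean-shift link; the source does not specify its formula.
    """
    common = sorted(set(base) & set(new))
    if not common:
        raise ValueError("no common items to link on")
    return mean(base[item] - new[item] for item in common)


def to_base_scale(base, new):
    """Shift every difficulty in `new` onto the scale of `base`."""
    shift = link_constant(base, new)
    return {item: b + shift for item, b in new.items()}


# Hypothetical difficulties in logits, each centered within its own form.
form_w = {"a": -1.0, "b": -0.5, "c": 0.0}
form_x = {"b": -1.0, "c": -0.5, "d": 0.5}  # shares items b and c with form_w

linked_x = to_base_scale(form_w, form_x)
pool = {**linked_x, **form_w}  # one pool on the form_w scale
```

Chaining such shifts from form to form would place all seven tests on one scale. Under this assumption the link is only as good as the agreement among the common items.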

Items in this pool were listed in ascending difficulty order. They 
were then divided into seven level tests each with 30 items (about 13 
overlapping up, 12 down, and 6 unique to each test). In the spring of 
1975, these new tests were administered to about 1500 students in the 
seventh grade. Rasch calibrations were again computed, and linking data 
used to adjust items to a common achievement scale. 


Rasch scaled achievement scores were computed for each possible raw 
score for each of the level tests administered in spring, 1975, and for 
the same items as calibrated in the original pool administered in spring, 
1974. It is important to remember that the original calibrations in 1974 
were based on different groups taking different tests from those used in 
spring, 1975. The comparable scaled achievement scores for the 1975 tests 
were established on the basis of the actual performance of students on the 
second (1975) administration. 
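
A minimal sketch of how a scaled score can be obtained for each possible raw score, given a set of item difficulties, is shown below. It uses the standard Rasch maximum-likelihood condition (expected score equals raw score), solved by Newton-Raphson. The function names, the step cap, and the difficulty values are assumptions for illustration; the source does not describe its estimation routine.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(b - theta))


def ability_for_raw_score(r, difficulties, tol=1e-8, max_iter=100):
    """Return (ability estimate, standard error) in logits for raw score r."""
    n = len(difficulties)
    if not 0 < r < n:
        raise ValueError("finite estimates exist only for 0 < r < n")
    # Starting value: mean difficulty plus the log-odds of the raw score.
    theta = sum(difficulties) / n + math.log(r / (n - r))
    for _ in range(max_iter):
        ps = [rasch_p(theta, b) for b in difficulties]
        info = sum(p * (1.0 - p) for p in ps)
        step = (r - sum(ps)) / info        # Newton-Raphson update
        step = max(-1.0, min(1.0, step))   # cap the step for stability
        theta += step
        if abs(step) < tol:
            break
    info = sum(p * (1.0 - p) for p in (rasch_p(theta, b) for b in difficulties))
    return theta, 1.0 / math.sqrt(info)


# Hypothetical item difficulties (logits) for a four-item test.
pool_items = [-1.0, -0.5, 0.5, 1.0]
score_table = {r: ability_for_raw_score(r, pool_items)
               for r in range(1, len(pool_items))}
```

Running the same function once with the 1974 pool difficulties and once with the 1975 level-test difficulties would give the two score tables being compared. Raw scores of zero and of all items correct have no finite estimate and are excluded.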
The results are shown in Tables 1-7. Note that for Level Test #1, 
the number of students taking the easiest level was too small to produce 
reliable results (N=22). Tables 1-7 compare Rasch scaled achievement scores 
for Level Tests 2-7, derived from calibrations based on the administration 
of these level tests, with the scaled scores derived from calibrations for 
the same items as administered in the tests that formed the original item 
pool. Also shown are the mean (arbitrarily set at 50) and standard deviation 
of the score distributions, significance of difference between means, 
correlations, and discrepancy between the level test scaled achievement 
scores and the corresponding score for the same pool items. 
Figures 1-7 show the correlation plots, again indicating the very 
close correspondence of scaled achievement scores from the two administrations. 
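
Two of the comparison statistics named above, the discrepancy at each raw score and the correlation between the two sets of scaled scores, can be computed as in the sketch below. The score values are hypothetical and are not taken from the tables; the helper names are likewise illustrative.

```python
import math


def discrepancies(level_scores, pool_scores):
    """Level-test scaled score minus pool-based scaled score, per raw score."""
    return [a - b for a, b in zip(level_scores, pool_scores)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scaled scores for raw scores 1..5 on one level test.
level_1975 = [40.0, 45.0, 50.0, 55.0, 60.0]
pool_1974 = [41.0, 46.0, 51.0, 56.0, 61.0]

diffs = discrepancies(level_1975, pool_1974)
r = pearson_r(level_1975, pool_1974)
```

A correlation near 1 alone does not establish agreement, since a constant offset leaves it unchanged. That is why the discrepancy at each score is examined alongside it.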